quollr: An R Package for Visalizing 2D Models in High Dimensional Space

An abstract of less than 150 words.

Jayani P.G. Lakshika https://jayanilakshika.netlify.app/ (Monash University) , Dianne Cook http://www.dicook.org/ (Monash University) , Paul Harrison (Monash University) , Michael Lydeamore (Monash University) , Thiyanga S. Talagala https://thiyanga.netlify.app/ (University of Sri Jayewardenepura)
2024-02-17
#library(quollr)
library(readr)
library(ggplot2)
library(dplyr)
library(ggbeeswarm)
library(Rtsne)
library(umap)
library(phateR)
library(reticulate)
library(rsample)

set.seed(20230531)

use_python("~/miniforge3/envs/pcamp_env/bin/python")
use_condaenv("pcamp_env")

reticulate::source_python(paste0(here::here(), "/scripts/function_scripts/Fit_PacMAP_code.py"))
reticulate::source_python(paste0(here::here(), "/scripts/function_scripts/Fit_TriMAP_code.py"))

1 Introduction

2 Methodology

2.1 Usage

library(tools)
package_dependencies("quollr")

Compute hexagonal bin configurations

num_bins_x <- calculate_effective_x_bins(.data = s_curve_noise_umap, x = "UMAP1", hex_size = NA)
num_bins_x
[1] 4
num_bins_y <- calculate_effective_y_bins(.data = s_curve_noise_umap, y = "UMAP2", hex_size = NA)
num_bins_y
[1] 8

Generate full hex grid

Generating full hexagonal grid contains main three steps:

  1. Generate all the hexagonal bin centroids

Steps:

cell_area <- 1

hex_size <- sqrt(2 * cell_area / sqrt(3))

buffer_size <- hex_size/2

x_bounds <- seq(min(s_curve_noise_umap[["UMAP1"]]) - buffer_size,
                  max(s_curve_noise_umap[["UMAP1"]]) + buffer_size, length.out = num_bins_x)

y_bounds <- seq(min(s_curve_noise_umap[["UMAP2"]]) - buffer_size,
                max(s_curve_noise_umap[["UMAP2"]]) + buffer_size, length.out = num_bins_y)

box_points <- expand.grid(x = x_bounds, y = y_bounds)

ggplot() +
  geom_point(data = box_points, aes(x = x, y = y), color = "red")

 box_points <- box_points |>
    dplyr::arrange(x) |>
    dplyr::group_by(x) |>
    dplyr::group_modify(~ generate_even_y(.x)) |>
    tibble::as_tibble()

ggplot() +
  geom_point(data = box_points,
             aes(x = x, y = y, colour = as.factor(is_even)))

## Shift for even values in x-axis
x_shift <- unique(box_points$x)[2] - unique(box_points$x)[1]


box_points$x <- box_points$x + x_shift/2 * ifelse(box_points$is_even == 1, 1, 0)

ggplot() +
  geom_point(data = box_points, aes(x = x, y = y), color = "red")

all_centroids_df <- generate_full_grid_centroids(nldr_df = s_curve_noise_umap, 
                                                 x = "UMAP1", y = "UMAP2", 
                                                 num_bins_x = num_bins_x, 
                                                 num_bins_y = num_bins_y, 
                                                 buffer_size = NA, hex_size = NA)

glimpse(all_centroids_df)
Rows: 32
Columns: 2
$ x <dbl> -3.8076427, -2.6742223, -3.8076427, -2.6742223, -3.8076427…
$ y <dbl> -6.2798254, -4.4744481, -2.6690708, -0.8636935, 0.9416838,…
  1. Generate hexagonal coordinates

Steps: - Compute horizontal width of the hexagon

hex_grid <- gen_hex_coordinates(all_centroids_df, hex_size = NA)
glimpse(hex_grid)
Rows: 192
Columns: 3
$ x  <dbl> -2.674222, -2.674222, -3.807643, -4.941063, -4.941063, -3…
$ y  <dbl> -5.6804828, -6.8791681, -7.4785108, -6.8791681, -5.680482…
$ id <int> 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, …
ggplot(data = hex_grid, aes(x = x, y = y)) + geom_polygon(fill = "white", color = "black", aes(group = id)) +
  geom_point(data = all_centroids_df, aes(x = x, y = y), color = "red")

  1. Map hexagonal IDs

Steps:

full_grid_with_hexbin_id <- map_hexbin_id(all_centroids_df)

ggplot(data = hex_grid, aes(x = x, y = y)) + geom_polygon(fill = "white", color = "black", aes(group = id)) +
  geom_text(data = full_grid_with_hexbin_id, aes(x = c_x, y = c_y, label = hexID))

  1. Map polygon IDs

Steps:

full_grid_with_polygon_id <- map_polygon_id(full_grid_with_hexbin_id, hex_grid)
  1. Assign data into hexagons
s_curve_noise_umap_with_id <- assign_data(s_curve_noise_umap, full_grid_with_hexbin_id)
  1. Compute standardized counts
df_with_std_counts <- compute_std_counts(nldr_df = s_curve_noise_umap_with_id)
  1. Extract full grid info
hex_full_count_df <- generate_full_grid_info(full_grid_with_polygon_id, df_with_std_counts, hex_grid)
ggplot(data = hex_grid, aes(x = x, y = y)) + geom_polygon(fill = "white", color = "black", aes(group = id)) +
  geom_point(data = s_curve_noise_umap, aes(x = UMAP1, y = UMAP2), color = "blue")

ggplot(data = hex_full_count_df, aes(x = x, y = y)) +
  geom_polygon(color = "black", aes(group = polygon_id, fill = std_counts)) +
  geom_text(aes(x = c_x, y = c_y, label = hexID)) +
  scale_fill_viridis_c(direction = -1, na.value = "#ffffff")

Buffer size

When generating hexagonal bins in R, a buffer is often included to ensure that the data points are evenly distributed within the bins and to prevent edge effects. The buffer helps in two main ways:

  1. Preventing Edge Effects: Without a buffer, the outermost data points might fall near the boundary of the hexagonal grid, leading to incomplete bins or uneven distribution of data. By adding a buffer, you create a margin around the outer edges of the grid, ensuring that all data points are fully enclosed within the bins.

  2. Ensuring Even Distribution: The buffer allows for a smoother transition between adjacent bins. This helps in cases where data points are not perfectly aligned with the grid lines, ensuring that each data point is assigned to the nearest bin without bias towards any specific direction.

Overall, including a buffer when generating hexagonal bins helps to produce more accurate and robust binning results, particularly when dealing with real-world data that may have irregular distributions or boundary effects.

Construct the 2D model with different options

Construct the high-D model with different options

## To generate a data set with high-D and 2D training data
df_all <- training_data |> dplyr::select(-ID) |>
  dplyr::bind_cols(s_curve_noise_umap_with_id)

## To generate averaged high-D data

df_bin <- avg_highD_data(.data = df_all, column_start_text = "x") ## Need to pass ID column name

Generate the triangular mesh

df_bin_centroids <- hex_full_count_df[complete.cases(hex_full_count_df[["std_counts"]]), ] |>
  dplyr::select("c_x", "c_y", "hexID", "std_counts") |>
  dplyr::distinct() |>
  dplyr::rename(c("x" = "c_x", "y" = "c_y"))
  
df_bin_centroids
            x          y hexID std_counts
1  -2.6742223 -4.4744481     5     1.0000
2  -1.5408019 -6.2798254     2     0.3125
3  -0.4073814 -4.4744481     6     0.0625
4  -1.5408019 -2.6690708    10     0.2500
5  -0.4073814 -0.8636935    14     0.5000
6   0.7260390 -2.6690708    11     0.1250
7   1.8594594 -0.8636935    15     0.1875
8   0.7260390  0.9416838    19     0.6250
9   1.8594594  2.7470611    23     0.2500
10  0.7260390  4.5524384    27     0.5625
11  1.8594594  6.3578158    31     0.3750
12  2.9928798  4.5524384    28     0.4375
tr1_object <- triangulate_bin_centroids(df_bin_centroids, x, y)
tr_from_to_df <- generate_edge_info(triangular_object = tr1_object)

Compute parameter defaults

Shift the hexagonal grid origin

If shift_x happen to the positive direction of x it should input as a positive value, if not other way If shift_y happen to the positive direction of y it should input as a positive value, if not other way

  1. Assign shift along the x and y axis (limited the amount should less than the cell_diameter)

  2. Generate bounds with shift origin

all_centroids_df_shift <- extract_coord_of_shifted_hex_grid(nldr_df = s_curve_noise_umap, 
                                                 x = "UMAP1", y = "UMAP2", 
                                                 num_bins_x = num_bins_x, 
                                                 num_bins_y = num_bins_y,
                                                 shift_x = 0.2690002, shift_y = 0.271183,
                                                 buffer_size = NA, hex_size = NA)

glimpse(all_centroids_df_shift)
Rows: 32
Columns: 2
$ x <dbl> -3.5386425, -2.4052221, -3.5386425, -2.4052221, -3.5386425…
$ y <dbl> -6.0086424, -4.2032651, -2.3978878, -0.5925105, 1.2128668,…
hex_grid <- gen_hex_coordinates(all_centroids_df_shift)
glimpse(hex_grid)
Rows: 192
Columns: 3
$ x  <dbl> -2.405222, -2.405222, -3.538643, -4.672063, -4.672063, -3…
$ y  <dbl> -5.409299776, -6.607985117, -7.207327787, -6.607985117, -…
$ id <int> 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, …
ggplot(data = hex_grid, aes(x = x, y = y)) + geom_polygon(fill = "white", color = "black", aes(group = id)) +
  geom_point(data = all_centroids_df_shift, aes(x = x, y = y), color = "red")

full_grid_with_hexbin_id <- map_hexbin_id(all_centroids_df_shift)

ggplot(data = hex_grid, aes(x = x, y = y)) + geom_polygon(fill = "white", color = "black", aes(group = id)) +
  geom_text(data = full_grid_with_hexbin_id, aes(x = c_x, y = c_y, label = hexID))

full_grid_with_polygon_id <- map_polygon_id(full_grid_with_hexbin_id, hex_grid)
s_curve_noise_umap_with_id <- assign_data(s_curve_noise_umap, full_grid_with_hexbin_id)
df_with_std_counts <- compute_std_counts(nldr_df = s_curve_noise_umap_with_id)
hex_full_count_df <- generate_full_grid_info(full_grid_with_polygon_id, df_with_std_counts, hex_grid)
ggplot(data = hex_grid, aes(x = x, y = y)) + geom_polygon(fill = "white", color = "black", aes(group = id)) +
  geom_point(data = s_curve_noise_umap, aes(x = UMAP1, y = UMAP2), color = "blue")

ggplot(data = hex_full_count_df, aes(x = x, y = y)) +
  geom_polygon(color = "black", aes(group = polygon_id, fill = std_counts)) +
  geom_text(aes(x = c_x, y = c_y, label = hexID)) +
  scale_fill_viridis_c(direction = -1, na.value = "#ffffff")

df_bin_centroids <- hex_full_count_df[complete.cases(hex_full_count_df[["std_counts"]]), ] |>
  dplyr::select("c_x", "c_y", "hexID", "std_counts") |>
  dplyr::distinct() |>
  dplyr::rename(c("x" = "c_x", "y" = "c_y"))

df_bin_centroids
            x          y hexID std_counts
1  -3.5386425 -6.0086424     1 0.21428571
2  -2.4052221 -4.2032651     5 1.00000000
3  -1.2718017 -6.0086424     2 0.42857143
4  -0.1383812 -4.2032651     6 0.07142857
5  -1.2718017 -2.3978878    10 0.21428571
6  -0.1383812 -0.5925105    14 0.50000000
7   0.9950392 -2.3978878    11 0.28571429
8   2.1284596 -0.5925105    15 0.14285714
9   0.9950392  1.2128668    19 0.64285714
10  2.1284596  3.0182441    23 0.35714286
11  0.9950392  4.8236214    27 1.00000000
12  2.1284596  6.6289988    31 0.07142857
13  3.2618800  4.8236214    28 0.42857143
tr1_object <- triangulate_bin_centroids(df_bin_centroids, x, y)
tr_from_to_df <- generate_edge_info(triangular_object = tr1_object)
bin_centroids_shift <- ggplot(data = hex_full_count_df, aes(x = c_x, y = c_y)) +
  geom_point(color = "#bdbdbd") +
  geom_point(data = shifted_hex_coord_df, aes(x = c_x, y = c_y), color = "#feb24c") +
  coord_cartesian(xlim = c(-5, 8), ylim = c(-10, 10)) +
  theme_void() +
  theme(legend.position="none", legend.direction="horizontal", plot.title = element_text(size = 7, hjust = 0.5, vjust = -0.5),
        axis.title.x = element_blank(), axis.title.y = element_blank(),
        axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        axis.text.y = element_blank(), axis.ticks.y = element_blank(),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #change legend key width
        legend.title = element_text(size=8), #change legend title font size
        legend.text = element_text(size=6)) +
  guides(fill = guide_colourbar(title = "Standardized count")) +
  annotate(geom = 'text', label = "a", x = -Inf, y = Inf, hjust = -0.3, vjust = 1, size = 3) 

hex_grid_shift <- ggplot(data = shifted_hex_coord_df, aes(x = x, y = y)) +
  geom_polygon(fill = NA, color = "#feb24c", aes(group = polygon_id)) +
  geom_polygon(data = hex_full_count_df, aes(x = x, y = y, group = polygon_id),
               fill = NA, color = "#bdbdbd") +
  coord_cartesian(xlim = c(-5, 8), ylim = c(-10, 10)) +
  theme_void() +
  theme(legend.position="none", legend.direction="horizontal", plot.title = element_text(size = 7, hjust = 0.5, vjust = -0.5),
        axis.title.x = element_blank(), axis.title.y = element_blank(),
        axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        axis.text.y = element_blank(), axis.ticks.y = element_blank(),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #change legend key width
        legend.title = element_text(size=8), #change legend title font size
        legend.text = element_text(size=6)) +
  guides(fill = guide_colourbar(title = "Standardized count")) +
  annotate(geom = 'text', label = "b", x = -Inf, y = Inf, hjust = -0.3, vjust = 1, size = 3) 

## Before shift
before_shift_plot <- ggplot(data = hex_full_count_df, aes(x = x, y = y)) +
  geom_polygon(color = "black", aes(group = polygon_id, fill = std_counts)) +
  geom_text(aes(x = c_x, y = c_y, label = hexID), size = 2) +
  scale_fill_viridis_c(direction = -1, na.value = "#ffffff", option = "C") +
  coord_equal() +
  theme_void() +
  theme(legend.position="bottom", legend.direction="horizontal", plot.title = element_text(size = 7, hjust = 0.5, vjust = -0.5),
        axis.title.x = element_blank(), axis.title.y = element_blank(),
        axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        axis.text.y = element_blank(), axis.ticks.y = element_blank(),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #change legend key width
        legend.title = element_text(size=8), #change legend title font size
        legend.text = element_text(size=6)) +
  guides(fill = guide_colourbar(title = "Standardized count")) +
  annotate(geom = 'text', label = "a", x = -Inf, y = Inf, hjust = -0.3, vjust = 1, size = 3) 


## After shift
after_shift_plot <- ggplot(data = shifted_hex_coord_df, aes(x = x, y = y)) +
  geom_polygon(color = "black", aes(group = polygon_id, fill = std_counts)) +
  geom_text(aes(x = c_x, y = c_y, label = hexID), size = 2) +
  scale_fill_viridis_c(direction = -1, na.value = "#ffffff", option = "C") +
  coord_equal() +
  theme_void() +
  theme(legend.position="none", legend.direction="horizontal", plot.title = element_text(size = 7, hjust = 0.5, vjust = -0.5),
        axis.title.x = element_blank(), axis.title.y = element_blank(),
        axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        axis.text.y = element_blank(), axis.ticks.y = element_blank(),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #change legend key width
        legend.title = element_text(size=8), #change legend title font size
        legend.text = element_text(size=6)) +
  guides(fill = guide_colourbar(title = "Standardized count")) +
  annotate(geom = 'text', label = "b", x = -Inf, y = Inf, hjust = -0.3, vjust = 1, size = 3) 
Benchmark value to remove the low-density hexagons
## As an option first quantile considered as a default
benchmark_to_rm_lwd_hex <- quantile(df_bin_centroids$std_counts)[2] + 0.01

## To identify low density hexagons
df_bin_centroids_low <- df_bin_centroids |>
  dplyr::filter(std_counts <= benchmark_to_rm_lwd_hex)

## To identify low-density hexagons needed to remove by investigating neighbouring mean density
identify_rm_bins <- find_low_density_hexagons(df_bin_centroids_all = df_bin_centroids, num_bins_x = num_bins_x,
                     df_bin_centroids_low = df_bin_centroids_low)
Benchmark value to remove the long edges
## Compute 2D distances
distance <- cal_2d_dist(.data = tr_from_to_df)

## To plot the distribution of distance
plot_dist <- function(distance_df){
  distance_df$group <- "1"
  dist_plot <- ggplot(distance_df, aes(x = group, y = distance)) +
    geom_quasirandom()+
    ylim(0, max(unlist(distance_df$distance))+ 0.5) + coord_flip()
  return(dist_plot)
}

plot_dist(distance)
benchmark <- find_benchmark_value(.data = distance, distance_col = "distance")
benchmark <- 3

Model function

Predict 2D embeddings

Compute residuals

Visualizations

geom_trimesh
trimesh <- ggplot(df_bin_centroids, aes(x = x, y = y)) +
  geom_point(size = 0.1) +
  geom_trimesh() +
  coord_equal()

trimesh

coloured_long_edges
trimesh_gr <- colour_long_edges(.data = distance, benchmark_value = benchmark,
                                triangular_object = tr1_object, distance_col = "distance")

trimesh_gr

remove long edges
trimesh_removed <- remove_long_edges(.data = distance, benchmark_value = benchmark,
                                     triangular_object = tr1_object, distance_col = "distance")
trimesh_removed

show_langevitour
## To generate a data set with high-D and 2D training data
df_all <- training_data |> dplyr::select(-ID) |>
  dplyr::bind_cols(s_curve_noise_umap_with_id)

## To generate averaged high-D data

df_bin <- avg_highD_data(.data = df_all, column_start_text = "x") ## Need to pass ID column name
tour1 <- show_langevitour(df_all, df_bin, df_bin_centroids, benchmark_value = benchmark,
                          distance = distance, distance_col = "distance")
tour1

2.2 Tests

All functions have tests written and implemented using the testthat (Wickham 2011) in R.

3 Application

medlea_df <- read_csv("data/medlea_dataset.csv")
names(medlea_df)[2:(NCOL(medlea_df) - 1)] <- paste0("x", 1:(NCOL(medlea_df) - 2))

medlea_df <- medlea_df |> ## Since only contains zeros
  select(-x10)

#medlea_df[,2:(NCOL(medlea_df) - 1)] <- scale(medlea_df[,2:(NCOL(medlea_df) - 1)])

calculate_pca <- function(feature_dataset, num_pcs){
  pcaY_cal <- prcomp(feature_dataset, center = TRUE, scale = TRUE)
  PCAresults <- data.frame(pcaY_cal$x[, 1:num_pcs])
  summary_pca <- summary(pcaY_cal)
  var_explained_df <- data.frame(PC= paste0("PC",1:50),
                               var_explained=(pcaY_cal$sdev[1:50])^2/sum((pcaY_cal$sdev[1:50])^2))
  return(list(prcomp_out = pcaY_cal,pca_components = PCAresults, summary = summary_pca, var_explained_pca  = var_explained_df))
}
features <- medlea_df[,2:(NCOL(medlea_df) - 1)]
pca_ref_calc <- calculate_pca(features, 8) 
pca_ref_calc$summary
Importance of components:
                          PC1    PC2    PC3     PC4     PC5     PC6
Standard deviation     3.1691 3.0609 2.7226 1.87967 1.71219 1.34192
Proportion of Variance 0.1969 0.1837 0.1453 0.06928 0.05748 0.03531
Cumulative Proportion  0.1969 0.3806 0.5260 0.59526 0.65274 0.68805
                           PC7     PC8     PC9    PC10    PC11
Standard deviation     1.27525 1.16992 1.13465 1.06628 1.03279
Proportion of Variance 0.03189 0.02684 0.02524 0.02229 0.02091
Cumulative Proportion  0.71993 0.74677 0.77202 0.79431 0.81522
                          PC12    PC13   PC14   PC15   PC16    PC17
Standard deviation     0.97899 0.96264 0.9528 0.9116 0.9090 0.79750
Proportion of Variance 0.01879 0.01817 0.0178 0.0163 0.0162 0.01247
Cumulative Proportion  0.83402 0.85219 0.8700 0.8863 0.9025 0.91496
                          PC18    PC19    PC20    PC21   PC22    PC23
Standard deviation     0.76725 0.72414 0.65310 0.61052 0.6019 0.55399
Proportion of Variance 0.01154 0.01028 0.00836 0.00731 0.0071 0.00602
Cumulative Proportion  0.92650 0.93678 0.94514 0.95245 0.9596 0.96557
                          PC24    PC25    PC26   PC27    PC28    PC29
Standard deviation     0.52293 0.46638 0.41959 0.3976 0.34697 0.33415
Proportion of Variance 0.00536 0.00426 0.00345 0.0031 0.00236 0.00219
Cumulative Proportion  0.97093 0.97520 0.97865 0.9818 0.98411 0.98630
                          PC30    PC31    PC32    PC33    PC34
Standard deviation     0.30618 0.29237 0.28458 0.26033 0.25420
Proportion of Variance 0.00184 0.00168 0.00159 0.00133 0.00127
Cumulative Proportion  0.98814 0.98982 0.99140 0.99273 0.99400
                          PC35    PC36    PC37    PC38   PC39    PC40
Standard deviation     0.22792 0.21644 0.20437 0.19127 0.1744 0.15586
Proportion of Variance 0.00102 0.00092 0.00082 0.00072 0.0006 0.00048
Cumulative Proportion  0.99502 0.99594 0.99676 0.99747 0.9981 0.99855
                          PC41    PC42    PC43    PC44    PC45
Standard deviation     0.15252 0.12519 0.10485 0.08598 0.08008
Proportion of Variance 0.00046 0.00031 0.00022 0.00014 0.00013
Cumulative Proportion  0.99900 0.99931 0.99952 0.99967 0.99980
                          PC46    PC47    PC48    PC49    PC50
Standard deviation     0.06491 0.04841 0.04094 0.03791 0.02347
Proportion of Variance 0.00008 0.00005 0.00003 0.00003 0.00001
Cumulative Proportion  0.99988 0.99992 0.99996 0.99999 1.00000
                          PC51
Standard deviation     0.01421
Proportion of Variance 0.00000
Cumulative Proportion  1.00000
var_explained_df <- pca_ref_calc$var_explained_pca
data_pca <- pca_ref_calc$pca_components |>
  mutate(ID = 1:NROW(pca_ref_calc$pca_components),
         shape_label = medlea_df$Shape_label)

var_explained_df |>
  ggplot(aes(x = PC,y = var_explained, group = 1))+
  geom_point(size=1)+
  geom_line()+
  labs(title="Scree plot: PCA on scaled data") +
  scale_x_discrete(limits = paste0(rep("PC", 50), 1:50)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
data_split <- initial_split(data_pca)
training_data <- training(data_split) |>
  arrange(ID)
test_data <- testing(data_split) |>
  arrange(ID)
UMAP_fit <- umap(training_data |> dplyr::select(-c(ID, shape_label)), n_neighbors = 37, n_components =  2)

UMAP_data <- UMAP_fit$layout |>
  as.data.frame()
names(UMAP_data)[1:(ncol(UMAP_data))] <- paste0(rep("UMAP",(ncol(UMAP_data))), 1:(ncol(UMAP_data)))

UMAP_data <- UMAP_data |>
  mutate(ID = training_data$id)

UMAP_data_with_label <- UMAP_data |>
  mutate(shape_label = training_data$shape_label)
UMAP_data_with_label |>
    ggplot(aes(x = UMAP1,
               y = UMAP2, color = shape_label))+
    geom_point(alpha=0.5) +
    coord_equal() +
    theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold")) + #ggtitle("(a)") +
  theme_linedraw() +
    theme(legend.position = "none", plot.title = element_text(size = 7, hjust = 0.5, vjust = -0.5),
              axis.title.x = element_blank(), axis.title.y = element_blank(),
              axis.text.x = element_blank(), axis.ticks.x = element_blank(),
              axis.text.y = element_blank(), axis.ticks.y = element_blank(),
              panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #change legend key width
        legend.title = element_text(size=5), #change legend title font size
        legend.text = element_text(size=4),
         legend.key.height = unit(0.25, 'cm'),
         legend.key.width = unit(0.25, 'cm')) +
  scale_color_manual(values=c("#b15928", "#1f78b4", "#cab2d6", "#ccebc5", "#fb9a99", "#e31a1c", "#6a3d9a", "#ff7f00", "#ffed6f", "#fdbf6f", "#ffff99", "#a6cee3", "#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3", "#fdb462", "#b3de69", "#fccde5", "#d9d9d9", "#b2df8a", "#bc80bd", "#33a02c", "#ccebc5", "#ffed6f", "#000000", "#bdbdbd"))

tSNE_data <- Fit_tSNE(training_data |> dplyr::select(-c(ID, shape_label)), opt_perplexity = calculate_effective_perplexity(training_data |> dplyr::select(-c(ID, shape_label))), with_seed = 20240110)

tSNE_data <- tSNE_data |>
  select(-ID) |>
  mutate(ID = training_data$ID)

tSNE_data_with_label <- tSNE_data |>
  mutate(shape_label = training_data$shape_label)

tSNE_data_with_label |>
    ggplot(aes(x = tSNE1,
               y = tSNE2, color = shape_label))+
    geom_point(alpha=0.5) +
    coord_equal() +
    theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold")) + #ggtitle("(a)") +
  theme_linedraw() +
    theme(legend.position = "none", plot.title = element_text(size = 7, hjust = 0.5, vjust = -0.5),
              axis.title.x = element_blank(), axis.title.y = element_blank(),
              axis.text.x = element_blank(), axis.ticks.x = element_blank(),
              axis.text.y = element_blank(), axis.ticks.y = element_blank(),
              panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #change legend key width
        legend.title = element_text(size=5), #change legend title font size
        legend.text = element_text(size=4),
         legend.key.height = unit(0.25, 'cm'),
         legend.key.width = unit(0.25, 'cm')) +
  scale_color_manual(values=c("#b15928", "#1f78b4", "#cab2d6", "#ccebc5", "#fb9a99", "#e31a1c", "#6a3d9a", "#ff7f00", "#ffed6f", "#fdbf6f", "#ffff99", "#a6cee3", "#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3", "#fdb462", "#b3de69", "#fccde5", "#d9d9d9", "#b2df8a", "#bc80bd", "#33a02c", "#ccebc5", "#ffed6f", "#000000", "#bdbdbd"))

PHATE_data <- Fit_PHATE(training_data |> dplyr::select(-c(ID, shape_label)), knn = 5, with_seed = 20240110)
Calculating PHATE...
  Running PHATE on 824 observations and 8 variables.
  Calculating graph and diffusion operator...
    Calculating KNN search...
    Calculating affinities...
  Calculated graph and diffusion operator in 0.01 seconds.
  Calculating optimal t...
    Automatically selected t = 24
  Calculated optimal t in 0.35 seconds.
  Calculating diffusion potential...
  Calculated diffusion potential in 0.29 seconds.
  Calculating metric MDS...
  Calculated metric MDS in 10.97 seconds.
Calculated PHATE in 11.62 seconds.
PHATE_data <- PHATE_data |>
  select(PHATE1, PHATE2)
PHATE_data <- PHATE_data |>
  mutate(ID = training_data$ID)

PHATE_data_with_label <- PHATE_data |>
  mutate(shape_label = training_data$shape_label)

PHATE_data_with_label |>
    ggplot(aes(x = PHATE1,
               y = PHATE2, color = shape_label))+
    geom_point(alpha=0.5) +
    coord_equal() +
    theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold")) + #ggtitle("(a)") +
  theme_linedraw() +
    theme(legend.position = "none", plot.title = element_text(size = 7, hjust = 0.5, vjust = -0.5),
              axis.title.x = element_blank(), axis.title.y = element_blank(),
              axis.text.x = element_blank(), axis.ticks.x = element_blank(),
              axis.text.y = element_blank(), axis.ticks.y = element_blank(),
              panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #change legend key width
        legend.title = element_text(size=5), #change legend title font size
        legend.text = element_text(size=4),
         legend.key.height = unit(0.25, 'cm'),
         legend.key.width = unit(0.25, 'cm')) +
  scale_color_manual(values=c("#b15928", "#1f78b4", "#cab2d6", "#ccebc5", "#fb9a99", "#e31a1c", "#6a3d9a", "#ff7f00", "#ffed6f", "#fdbf6f", "#ffff99", "#a6cee3", "#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3", "#fdb462", "#b3de69", "#fccde5", "#d9d9d9", "#b2df8a", "#bc80bd", "#33a02c", "#ccebc5", "#ffed6f", "#000000", "#bdbdbd"))

tem_dir <- tempdir()

Fit_TriMAP_data(training_data |> dplyr::select(-c(ID, shape_label)), tem_dir)

path <- file.path(tem_dir, "df_2_without_class.csv")
path2 <- file.path(tem_dir, "dataset_3_TriMAP_values.csv")

Fit_TriMAP(as.integer(2), as.integer(5), as.integer(4), as.integer(3), path, path2)

TriMAP_data <- read_csv(path2)
TriMAP_data <- TriMAP_data |>
  mutate(ID = training_data$ID)

TriMAP_data_with_label <- TriMAP_data |>
  mutate(shape_label = training_data$shape_label)

TriMAP_data_with_label |>
    ggplot(aes(x = TriMAP1,
               y = TriMAP2, color = shape_label))+
    geom_point(alpha=0.5) +
    coord_equal() +
    theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold")) + #ggtitle("(a)") +
  theme_linedraw() +
    theme(legend.position = "none", plot.title = element_text(size = 7, hjust = 0.5, vjust = -0.5),
              axis.title.x = element_blank(), axis.title.y = element_blank(),
              axis.text.x = element_blank(), axis.ticks.x = element_blank(),
              axis.text.y = element_blank(), axis.ticks.y = element_blank(),
              panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #change legend key width
        legend.title = element_text(size=5), #change legend title font size
        legend.text = element_text(size=4),
         legend.key.height = unit(0.25, 'cm'),
         legend.key.width = unit(0.25, 'cm')) +
  scale_color_manual(values=c("#b15928", "#1f78b4", "#cab2d6", "#ccebc5", "#fb9a99", "#e31a1c", "#6a3d9a", "#ff7f00", "#ffed6f", "#fdbf6f", "#ffff99", "#a6cee3", "#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3", "#fdb462", "#b3de69", "#fccde5", "#d9d9d9", "#b2df8a", "#bc80bd", "#33a02c", "#ccebc5", "#ffed6f", "#000000", "#bdbdbd"))

tem_dir <- tempdir()

Fit_PacMAP_data(training_data |> dplyr::select(-c(ID, shape_label)), tem_dir)

path <- file.path(tem_dir, "df_2_without_class.csv")
path2 <- file.path(tem_dir, "dataset_3_PaCMAP_values.csv")

Fit_PaCMAP(as.integer(2), as.integer(10), "random", 0.9, as.integer(2), path, path2)

PaCMAP_data <- read_csv(path2)
PaCMAP_data <- PaCMAP_data |>
  mutate(ID = training_data$ID)

PaCMAP_data_with_label <- PaCMAP_data |>
  mutate(shape_label = training_data$shape_label)

PaCMAP_data_with_label |>
    ggplot(aes(x = PaCMAP1,
               y = PaCMAP2, color = shape_label))+
    geom_point(alpha=0.5) +
    coord_equal() +
    theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold")) + #ggtitle("(a)") +
  theme_linedraw() +
    theme(legend.position = "none", plot.title = element_text(size = 7, hjust = 0.5, vjust = -0.5),
              axis.title.x = element_blank(), axis.title.y = element_blank(),
              axis.text.x = element_blank(), axis.ticks.x = element_blank(),
              axis.text.y = element_blank(), axis.ticks.y = element_blank(),
              panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #change legend key width
        legend.title = element_text(size=5), #change legend title font size
        legend.text = element_text(size=4),
         legend.key.height = unit(0.25, 'cm'),
         legend.key.width = unit(0.25, 'cm')) +
  scale_color_manual(values=c("#b15928", "#1f78b4", "#cab2d6", "#ccebc5", "#fb9a99", "#e31a1c", "#6a3d9a", "#ff7f00", "#ffed6f", "#fdbf6f", "#ffff99", "#a6cee3", "#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3", "#fdb462", "#b3de69", "#fccde5", "#d9d9d9", "#b2df8a", "#bc80bd", "#33a02c", "#ccebc5", "#ffed6f", "#000000", "#bdbdbd"))

num_bins_x <- calculate_effective_x_bins(.data = tSNE_data, x = "tSNE1", hex_size = NA)
num_bins_y <- calculate_effective_y_bins(.data = tSNE_data, y = "tSNE2", hex_size = NA)
num_bins_y
[1] 34
all_centroids_df <- generate_full_grid_centroids(nldr_df = tSNE_data, 
                                                 x = "tSNE1", y = "tSNE2", 
                                                 num_bins_x = num_bins_x, 
                                                 num_bins_y = num_bins_y, 
                                                 buffer_size = NA, hex_size = NA)


hex_grid <- gen_hex_coordinates(all_centroids_df)

full_grid_with_hexbin_id <- map_hexbin_id(all_centroids_df)

full_grid_with_polygon_id <- map_polygon_id(full_grid_with_hexbin_id, hex_grid)

tSNE_data_with_id <- assign_data(tSNE_data, full_grid_with_hexbin_id)

df_with_std_counts <- compute_std_counts(nldr_df = tSNE_data_with_id)

hex_full_count_df <- generate_full_grid_info(full_grid_with_polygon_id, df_with_std_counts, hex_grid)

ggplot(data = hex_full_count_df, aes(x = x, y = y)) +
  geom_polygon(color = "black", aes(group = polygon_id, fill = std_counts)) +
  geom_text(aes(x = c_x, y = c_y, label = hexID)) +
  scale_fill_viridis_c(direction = -1, na.value = "#ffffff")

ggplot(data = hex_grid, aes(x = x, y = y)) + geom_polygon(fill = "white", color = "black", aes(group = id)) +
  geom_point(data = tSNE_data, aes(x = tSNE1, y = tSNE2), color = "blue")

df_bin_centroids <- hex_full_count_df[complete.cases(hex_full_count_df[["std_counts"]]), ] |>
  dplyr::select("c_x", "c_y", "hexID", "std_counts") |>
  dplyr::distinct() |>
  dplyr::rename(c("x" = "c_x", "y" = "c_y"))

df_bin_centroids
              x           y hexID std_counts
1   -27.4735071  -6.3511209   430 0.23076923
2   -27.4735071  -2.9680721   496 0.38461538
3   -25.5403904  -6.3511209   431 0.30769231
4   -26.5069487  -4.6595965   464 0.30769231
5   -25.5403904  -2.9680721   497 0.23076923
6   -26.5069487  -1.2765477   530 0.23076923
7   -23.6072737  -9.7341697   366 0.07692308
8   -24.5738320  -8.0426453   399 0.15384615
9   -23.6072737  -6.3511209   432 0.15384615
10  -24.5738320  -4.6595965   465 0.69230769
11  -24.5738320  -1.2765477   531 0.38461538
12  -24.5738320   2.1065011   597 0.15384615
13  -21.6741570  -9.7341697   367 0.30769231
14  -22.6407153  -8.0426453   400 0.38461538
15  -21.6741570  -6.3511209   433 0.30769231
16  -22.6407153  -4.6595965   466 0.07692308
17  -22.6407153  -1.2765477   532 0.23076923
18  -19.7410403  -9.7341697   368 0.23076923
19  -20.7075986  -8.0426453   401 0.30769231
20  -19.7410403  -6.3511209   434 0.15384615
21  -20.7075986  -4.6595965   467 0.07692308
22  -19.7410403  -2.9680721   500 0.07692308
23  -19.7410403   3.7980255   632 0.07692308
24  -17.8079236  -9.7341697   369 0.46153846
25  -18.7744819  -8.0426453   402 0.30769231
26  -17.8079236  -6.3511209   435 0.46153846
27  -18.7744819  -4.6595965   468 0.46153846
28  -17.8079236  -2.9680721   501 0.23076923
29  -18.7744819   2.1065011   600 0.38461538
30  -17.8079236   3.7980255   633 0.38461538
31  -18.7744819   5.4895499   666 0.23076923
32  -17.8079236   7.1810743   699 0.46153846
33  -15.8748069  -9.7341697   370 0.23076923
34  -16.8413652  -8.0426453   403 0.30769231
35  -15.8748069  -6.3511209   436 0.15384615
36  -16.8413652  -4.6595965   469 0.30769231
37  -15.8748069  -2.9680721   502 0.38461538
38  -15.8748069   0.4149767   568 0.07692308
39  -16.8413652   2.1065011   601 0.23076923
40  -15.8748069   3.7980255   634 0.23076923
41  -16.8413652   5.4895499   667 0.15384615
42  -15.8748069   7.1810743   700 0.15384615
43  -13.9416902  -9.7341697   371 0.38461538
44  -14.9082485  -8.0426453   404 0.23076923
45  -14.9082485  -4.6595965   470 0.15384615
46  -13.9416902  -2.9680721   503 0.23076923
47  -14.9082485  -1.2765477   536 0.23076923
48  -13.9416902   0.4149767   569 0.23076923
49  -14.9082485   2.1065011   602 0.07692308
50  -13.9416902   3.7980255   635 0.23076923
51  -14.9082485   5.4895499   668 0.30769231
52  -13.9416902   7.1810743   701 0.23076923
53  -14.9082485   8.8725987   734 0.07692308
54  -13.9416902  10.5641230   767 0.23076923
55  -12.0085735  -9.7341697   372 0.23076923
56  -12.9751318  -8.0426453   405 0.61538462
57  -12.9751318  -4.6595965   471 0.15384615
58  -12.0085735  -2.9680721   504 0.23076923
59  -12.9751318  -1.2765477   537 0.15384615
60  -12.0085735   0.4149767   570 0.38461538
61  -12.9751318   2.1065011   603 0.15384615
62  -12.0085735   3.7980255   636 0.23076923
63  -12.9751318   5.4895499   669 0.53846154
64  -12.0085735   7.1810743   702 0.38461538
65  -12.9751318   8.8725987   735 0.30769231
66  -11.0420151  -4.6595965   472 0.07692308
67  -10.0754568  -2.9680721   505 0.07692308
68  -11.0420151  -1.2765477   538 0.38461538
69  -10.0754568   0.4149767   571 0.38461538
70  -11.0420151   2.1065011   604 0.23076923
71  -11.0420151   5.4895499   670 0.07692308
72  -10.0754568   7.1810743   703 0.07692308
73  -11.0420151   8.8725987   736 0.38461538
74  -10.0754568  10.5641230   769 0.38461538
75  -11.0420151  12.2556474   802 0.07692308
76   -8.1423401  -2.9680721   506 0.07692308
77   -8.1423401   0.4149767   572 0.23076923
78   -9.1088984   2.1065011   605 0.15384615
79   -8.1423401   3.7980255   638 0.38461538
80   -9.1088984   5.4895499   671 0.23076923
81   -8.1423401   7.1810743   704 0.38461538
82   -9.1088984   8.8725987   737 0.15384615
83   -8.1423401  10.5641230   770 0.23076923
84   -9.1088984  12.2556474   803 0.15384615
85   -6.2092234  -2.9680721   507 0.15384615
86   -6.2092234   0.4149767   573 0.15384615
87   -7.1757817   2.1065011   606 0.07692308
88   -6.2092234   3.7980255   639 0.15384615
89   -7.1757817   5.4895499   672 0.23076923
90   -6.2092234   7.1810743   705 0.15384615
91   -6.2092234  10.5641230   771 0.15384615
92   -5.2426650  -4.6595965   475 0.07692308
93   -4.2761067  -2.9680721   508 0.38461538
94   -5.2426650  -1.2765477   541 0.15384615
95   -4.2761067   0.4149767   574 0.30769231
96   -4.2761067   3.7980255   640 0.15384615
97   -5.2426650   5.4895499   673 0.30769231
98   -4.2761067   7.1810743   706 0.15384615
99   -5.2426650   8.8725987   739 0.15384615
100  -4.2761067  10.5641230   772 0.38461538
101  -4.2761067  24.0963182  1036 0.07692308
102  -3.3095483  -4.6595965   476 0.07692308
103  -2.3429900  -2.9680721   509 0.30769231
104  -3.3095483  -1.2765477   542 0.53846154
105  -2.3429900   0.4149767   575 0.46153846
106  -2.3429900   3.7980255   641 0.23076923
107  -3.3095483   5.4895499   674 0.23076923
108  -2.3429900   7.1810743   707 0.30769231
109  -3.3095483   8.8725987   740 0.38461538
110  -2.3429900  10.5641230   773 0.07692308
111  -3.3095483  12.2556474   806 0.07692308
112  -2.3429900  20.7132694   971 0.23076923
113  -3.3095483  22.4047938  1004 0.46153846
114  -2.3429900  24.0963182  1037 0.38461538
115  -3.3095483  25.7878426  1070 0.07692308
116  -1.3764317  -1.2765477   543 0.07692308
117  -0.4098733   3.7980255   642 0.23076923
118  -1.3764317   5.4895499   675 0.07692308
119  -0.4098733   7.1810743   708 0.30769231
120  -1.3764317   8.8725987   741 0.38461538
121  -0.4098733  10.5641230   774 0.07692308
122  -1.3764317  19.0217450   939 0.15384615
123  -0.4098733  20.7132694   972 0.38461538
124  -1.3764317  22.4047938  1005 0.38461538
125  -1.3764317  25.7878426  1071 0.53846154
126  -0.4098733  27.4793670  1104 0.15384615
127   0.5566850   5.4895499   676 0.23076923
128   1.5232434   7.1810743   709 0.23076923
129   0.5566850   8.8725987   742 0.07692308
130   1.5232434  20.7132694   973 0.46153846
131   0.5566850  22.4047938  1006 0.69230769
132   1.5232434  24.0963182  1039 0.23076923
133   0.5566850  25.7878426  1072 0.53846154
134   1.5232434  27.4793670  1105 0.07692308
135   3.4563601 -26.6494136    50 0.30769231
136   3.4563601 -23.2663648   116 0.23076923
137   2.4898017 -21.5748405   149 0.07692308
138   3.4563601  -6.3511209   446 0.23076923
139   2.4898017  -1.2765477   545 0.15384615
140   3.4563601   3.7980255   644 0.15384615
141   2.4898017   5.4895499   677 0.15384615
142   3.4563601   7.1810743   710 0.15384615
143   2.4898017   8.8725987   743 0.38461538
144   3.4563601  10.5641230   776 0.07692308
145   2.4898017  22.4047938  1007 0.38461538
146   3.4563601  24.0963182  1040 0.07692308
147   2.4898017  25.7878426  1073 0.07692308
148   5.3894768 -26.6494136    51 0.84615385
149   5.3894768 -23.2663648   117 0.53846154
150   4.4229184 -21.5748405   150 0.84615385
151   5.3894768 -19.8833161   183 0.07692308
152   4.4229184 -18.1917917   216 0.07692308
153   5.3894768 -16.5002673   249 0.15384615
154   5.3894768  -9.7341697   381 0.15384615
155   4.4229184  -8.0426453   414 0.23076923
156   5.3894768  -6.3511209   447 0.15384615
157   4.4229184   2.1065011   612 0.07692308
158   5.3894768   3.7980255   645 0.30769231
159   4.4229184   8.8725987   744 0.07692308
160   6.3560351 -28.3409380    19 0.30769231
161   7.3225935 -26.6494136    52 0.53846154
162   6.3560351 -21.5748405   151 0.15384615
163   6.3560351 -18.1917917   217 0.84615385
164   7.3225935 -16.5002673   250 0.30769231
165   6.3560351  -8.0426453   415 0.23076923
166   7.3225935  -6.3511209   448 0.07692308
167   7.3225935  -2.9680721   514 0.53846154
168   6.3560351  -1.2765477   547 0.07692308
169   7.3225935   0.4149767   580 0.23076923
170   6.3560351   2.1065011   613 0.23076923
171   7.3225935  24.0963182  1042 0.76923077
172   8.2891518 -28.3409380    20 0.07692308
173   8.2891518 -14.8087429   284 0.15384615
174   8.2891518  -8.0426453   416 0.23076923
175   9.2557102  -6.3511209   449 0.15384615
176   8.2891518  -4.6595965   482 0.23076923
177   9.2557102  -2.9680721   515 0.53846154
178   8.2891518  -1.2765477   548 0.23076923
179   9.2557102   0.4149767   581 0.30769231
180   8.2891518   5.4895499   680 0.15384615
181   9.2557102   7.1810743   713 0.15384615
182   9.2557102  20.7132694   977 0.15384615
183   8.2891518  22.4047938  1010 0.53846154
184   9.2557102  24.0963182  1043 0.07692308
185   8.2891518  25.7878426  1076 0.07692308
186  10.2222685 -11.4256941   351 0.46153846
187  10.2222685  -8.0426453   417 0.30769231
188  11.1888269  -6.3511209   450 0.23076923
189  10.2222685  -4.6595965   483 0.30769231
190  11.1888269  -2.9680721   516 0.07692308
191  10.2222685  -1.2765477   549 0.30769231
192  11.1888269   0.4149767   582 0.30769231
193  10.2222685   2.1065011   615 0.23076923
194  10.2222685   5.4895499   681 0.23076923
195  11.1888269   7.1810743   714 0.23076923
196  10.2222685  22.4047938  1011 0.15384615
197  13.1219436  -9.7341697   385 0.38461538
198  12.1553852  -8.0426453   418 0.30769231
199  13.1219436  -6.3511209   451 0.15384615
200  12.1553852  -4.6595965   484 0.46153846
201  13.1219436  -2.9680721   517 0.30769231
202  12.1553852  -1.2765477   550 0.53846154
203  13.1219436   0.4149767   583 0.53846154
204  12.1553852   2.1065011   616 0.30769231
205  12.1553852   5.4895499   682 0.07692308
206  15.0550603  -9.7341697   386 0.07692308
207  14.0885019  -8.0426453   419 0.30769231
208  15.0550603  -6.3511209   452 0.23076923
209  14.0885019  -4.6595965   485 0.38461538
210  15.0550603  -2.9680721   518 0.30769231
211  14.0885019  -1.2765477   551 0.07692308
212  15.0550603   0.4149767   584 0.61538462
213  14.0885019   2.1065011   617 0.38461538
214  16.0216186 -14.8087429   288 0.07692308
215  16.9881770 -13.1172185   321 0.46153846
216  16.9881770  -9.7341697   387 0.30769231
217  16.0216186  -8.0426453   420 0.23076923
218  16.9881770  -6.3511209   453 0.07692308
219  16.0216186  -4.6595965   486 0.15384615
220  16.9881770   0.4149767   585 0.07692308
221  16.0216186   2.1065011   618 0.07692308
222  17.9547353 -14.8087429   289 0.46153846
223  17.9547353 -11.4256941   355 0.38461538
224  17.9547353  -4.6595965   487 0.07692308
225  18.9212937  -2.9680721   520 0.38461538
226  17.9547353   2.1065011   619 0.23076923
227  19.8878520 -11.4256941   356 0.46153846
228  20.8544104  -9.7341697   389 0.07692308
229  19.8878520  -1.2765477   554 0.30769231
230  21.8209687 -11.4256941   357 0.30769231
231  26.6537605   3.7980255   656 0.15384615
232  27.6203188  -1.2765477   558 0.15384615
233  28.5868771   0.4149767   591 0.61538462
234  27.6203188   2.1065011   624 0.23076923
235  29.5534355  -1.2765477   559 0.30769231
236  29.5534355   2.1065011   625 0.84615385
237  30.5199938   3.7980255   658 0.46153846
238  31.4865522   2.1065011   626 0.07692308
239  32.4531105   3.7980255   659 1.00000000
tr1_object <- triangulate_bin_centroids(df_bin_centroids, x, y)
tr_from_to_df <- generate_edge_info(triangular_object = tr1_object)
## To generate a data set with high-D and 2D training data
df_all <- training_data |> dplyr::select(-c(ID, shape_label)) |>
  dplyr::bind_cols(tSNE_data_with_id)

## To generate averaged high-D data

df_bin <- avg_highD_data(.data = df_all, column_start_text = "PC") ## Need to pass ID column name
## Compute 2D distances
distance <- cal_2d_dist(.data = tr_from_to_df)

plot_dist(distance)
benchmark <- find_benchmark_value(.data = distance, distance_col = "distance")
trimesh <- ggplot(df_bin_centroids, aes(x = x, y = y)) +
  geom_point(size = 0.1) +
  geom_trimesh() +
  coord_equal()

trimesh

trimesh_gr <- colour_long_edges(.data = distance, benchmark_value = benchmark,
                                triangular_object = tr1_object, distance_col = "distance")

trimesh_gr

trimesh_removed <- remove_long_edges(.data = distance, benchmark_value = benchmark,
                                     triangular_object = tr1_object, distance_col = "distance")
trimesh_removed

tour1 <- show_langevitour(df_all, df_bin, df_bin_centroids, benchmark_value = benchmark,
                          distance = distance, distance_col = "distance", column_start_text = "PC")
tour1

4 Conclusion

5 Acknowledgements

This article is created using knitr (Xie 2015) and rmarkdown (Xie et al. 2018) in R with the rjtools::rjournal_article template. The source code for reproducing this paper can be found at: https://github.com/JayaniLakshika/paper-quollr.

5.1 CRAN packages used

testthat, knitr, rmarkdown

5.2 CRAN Task Views implied by cited packages

ReproducibleResearch

H. Wickham. Testthat: Get started with testing. The R Journal, 3: 5–10, 2011. URL https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
Y. Xie. Dynamic documents with R and knitr. 2nd ed Boca Raton, Florida: Chapman; Hall/CRC, 2015. URL https://yihui.name/knitr/. ISBN 978-1498716963.
Y. Xie, J. J. Allaire and G. Grolemund. R markdown: The definitive guide. Boca Raton, Florida: Chapman; Hall/CRC, 2018. URL https://bookdown.org/yihui/rmarkdown. ISBN 978-1138359338.

References

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Lakshika, et al., "quollr: An R Package for Visalizing 2D Models in High Dimensional Space", The R Journal, 2024

BibTeX citation

@article{paper-quollr,
  author = {Lakshika, Jayani P.G. and Cook, Dianne and Harrison, Paul and Lydeamore, Michael and Talagala, Thiyanga S.},
  title = {quollr: An R Package for Visalizing 2D Models in High Dimensional Space},
  journal = {The R Journal},
  year = {2024},
  issn = {2073-4859},
  pages = {1}
}